Abstract

Double hashing has recently found more common usage in schemes that use multiple hash functions. In double 
hashing, for an item x, one generates two hash values f(x) and g(x), and then uses combinations (f(x)+kg(x)) mod 
n for k = 0, 1, 2, ... to generate multiple hash values from the initial two. We first perform an empirical study 
showing that, surprisingly, the performance difference between double hashing and fully random hashing appears 
negligible in the standard balanced allocation paradigm, where each item is placed in the least loaded of d choices,

as well as several related variants. We then provide theoretical results that explain the behavior of double hashing in 
this context. 



1 Introduction 
The standard balanced allocation paradigm works as follows: suppose n balls are sequentially placed into n bins,
where each ball is placed in the least loaded of d uniform independent choices of the bins. Then the maximum load
(that is, the maximum number of balls in a bin) is $\log\log n / \log d + O(1)$, much lower than the
$(\log n / \log\log n)(1 + o(1))$ obtained where each ball is placed according to a single uniform choice [2].

The assumption that each ball obtains d independent uniform choices is a strong one, and a reasonable question, 
tackled by several other works, is how much randomness is needed for these types of results (see related work below). 
Here we consider a novel approach, examining balanced allocations in conjunction with double hashing. In the
well-known technique of standard double hashing for open-addressed hash tables, the jth ball obtains two hash values,
f(j) and g(j). For a hash table of size n, f(j) ∈ [0, n - 1] and g(j) ∈ [1, n - 1]. Successive locations
h(j, k) = (f(j) + k g(j)) mod n, k = 0, 1, 2, ..., are tried until an empty slot is found.

In our context, we use the double hashing approach somewhat differently. The jth ball again obtains two hash
values f(j) and g(j). The d choices for the jth ball are then given by h(j, k) = (f(j) + k g(j)) mod n, k = 0, 1, ..., d - 1,
and the ball is placed in the least loaded. We generally assume that f(j) is uniform over [0, n - 1], g(j) is uniform
over all numbers in [1, n - 1] relatively prime to n, and all hash values are independent. (It is convenient to consider n
a prime, or take n to be a power of 2 so that the g(j) are uniformly chosen random odd numbers, to ensure the h(j, k)
are distinct.)
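
As a minimal sketch of how the d choices can be generated from the two hash values, consider the following Python snippet. The function name `double_hash_choices` and the explicit coprimality check are illustrative assumptions rather than part of the scheme's description, and f and g are passed in directly instead of being derived from an actual hash function.

```python
from math import gcd


def double_hash_choices(f, g, d, n):
    """Return the d bins h(k) = (f + k*g) mod n for k = 0, ..., d-1.

    Assumes 0 <= f < n, 1 <= g < n, and d <= n. The check below enforces
    gcd(g, n) = 1, which guarantees that the d choices are distinct.
    """
    if gcd(g, n) != 1:
        raise ValueError("stride g must be relatively prime to n")
    return [(f + k * g) % n for k in range(d)]


# Hypothetical example values: a table of size 16 with odd stride 3.
choices = double_hash_choices(f=5, g=3, d=4, n=16)
print(choices)  # [5, 8, 11, 14]
```

With n a power of 2, any odd g passes the check, which matches the convention of choosing random odd strides.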

It might appear that limiting the space of random choices available to the balls in this way might change the 
behavior of this random process significantly. We show, empirically, that this is not the case; surprisingly, the difference 
between d fully independent choices and d choices using double hashing appears negligible for sufficiently large n.^1
As a starting example, Table 1 below shows the fraction of bins of load x for various x taken over 10000 trials, with
$n = 2^{14}$ balls thrown into n bins using d = 3 and d = 4 choices, using both double hashing and fully random hash
values (where our proxy for "random" is simply generating random values using the drand48 function in C seeded by 
time). Most values are given to five decimal places. The performance difference is essentially negligible, well within 
what one would expect simply from variance from the sampling process. 
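
An experiment of this kind can be sketched in a few lines of Python. This is not the code used for the reported results: it assumes Python's seeded `random.Random` in place of drand48, breaks ties in favor of the first least-loaded choice, takes n to be a power of 2 with odd strides, and uses hypothetical function names (`simulate`, `load_fractions`).

```python
import random
from collections import Counter


def simulate(n, d, double_hashing, rng):
    """Throw n balls into n bins (n a power of 2) and return the bin loads."""
    loads = [0] * n
    for _ in range(n):
        if double_hashing:
            f = rng.randrange(n)
            g = rng.randrange(1, n, 2)  # uniform odd stride, coprime to n
            choices = [(f + k * g) % n for k in range(d)]
        else:
            choices = rng.sample(range(n), d)  # d distinct uniform bins
        # Place the ball in the least loaded choice (first one on ties).
        target = min(choices, key=loads.__getitem__)
        loads[target] += 1
    return loads


def load_fractions(n, d, double_hashing, trials, seed):
    """Fraction of bins with each load, aggregated over all trials."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(simulate(n, d, double_hashing, rng))
    total = n * trials
    return {load: c / total for load, c in sorted(counts.items())}


if __name__ == "__main__":
    for name, flag in (("fully random", False), ("double hashing", True)):
        print(name, load_fractions(2 ** 10, 3, flag, trials=20, seed=1))
```

Larger n and more trials are needed to reproduce the five-decimal agreement discussed in the text; the small defaults here only keep the sketch fast.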

More extensive empirical results appear in Section 2. In particular, we also consider two extensions to the standard
paradigm: Vöcking's extension (sometimes called d-left hashing), where the n bins are split into d subtables of size
n/d laid out left to right, the d choices consist of one uniform independent choice in each subtable, and ties for the
least loaded bin are broken to the left [31]; and the continuous variation, where the bins represent queues, and the balls
represent customers that arrive as a Poisson process and have exponentially distributed service requirements [21]. We
again find empirically that replacing fully random choices with double hashing appears to only negligibly change
performance.

1 To be clear, we do not mean that there is no difference between double hashing and fully random hashing in this setting; there clearly is and we
note a simple example further in the paper. As we show analytically, in the limit for large n the difference is in vanishing terms, and for finite n the
results from our experiments suggest negligible difference in practice.

(a) 3 choices, $n = 2^{14}$ balls and bins

| Load | Fully Random | Double Hashing |
|------|--------------|----------------|
| 0    | 0.17693      | 0.17691        |
| 1    | 0.64664      | 0.64670        |
| 2    | 0.17592      | 0.17589        |
| 3    | 0.00051      | 0.00051        |

(b) 4 choices, $n = 2^{14}$ balls and bins

| Load | Fully Random         | Double Hashing       |
|------|----------------------|----------------------|
| 0    | 0.14081              | 0.14081              |
| 1    | 0.71840              | 0.71841              |
| 2    | 0.14077              | 0.14076              |
| 3    | $2.25 \cdot 10^{-5}$ | $2.29 \cdot 10^{-5}$ |

Table 1: An initial example showing the performance of double hashing compared to fully random hashing. In our
tables, the row with load x gives the fraction of the bins that have load x over all trials. So over 10000 trials of throwing
$n = 2^{14}$ balls into $2^{14}$ bins using 3 choices and double hashing, the fraction of bins with load 0 was 0.17691.


We consider theoretical arguments for why this would be the case. There are multiple methods available that can 
yield O(log log n) bounds on the maximum load when n balls are thrown into n bins. We therefore first demonstrate
how some previously used methods readily yield O(log log n) bounds; this behavior is, arguably, unsurprising (at least
in hindsight). We then examine the key question of why there is no difference in empirical results. For the case of fully
random choices, the asymptotic fraction of bins of each possible load can be determined using fluid limit methods that
yield a family of differential equations describing the process behavior [22]. It is not clear, however, why the method
of differential equations should apply when using double hashing, and the primary result of this paper is to explain 
why it in fact applies. We believe this resolution suggests that double hashing can be used to obtain the same results 
as fully random hashing in various hash-based structures, which may be important in practical settings. 

We argue these results are important for multiple reasons. First, we believe the fact that moving from fully random 
hashing to double hashing does not change performance is interesting in its own right. But it also has practical 
applications; multiple-choice hashing is used in several hardware systems (such as routers), and double hashing both 
requires less (pseudo-)randomness and is extremely conducive to implementation in hardware. (As we discuss below, 
it may also be useful in software systems.) Finally, as mentioned, these results suggest that using double hashing in 
place of fully random choices may similarly yield the same performance in other settings, such as for cuckoo hashing 
or in error-correcting codes, offering the same potential benefits for these problems. We explore this issue further in a
follow-up paper [24].

1.1 Related Work 

The balanced allocations paradigm, or the power of two choices, has been the subject of a great deal of work, both in
the discrete balls and bins setting and in the queueing theoretic setting. See, for example, [16, 23].

Several recent works have considered hashing variations with less randomness in place of assuming perfectly
random hash functions; indeed, there is a long history of work on universal hash functions [8], and more recently
min-wise independent hashing [7]. Specific recent related work includes results on standard one-choice balls and bins
problems [9], hashing with linear probing with limited independence [27], and tabulation hashing [28]. Perhaps the
most related example is work by Woelfel [33], which shows that a variation of Vöcking's results holds using simple hash
functions that utilize a collection of k-wise independent hash functions for small k, and a random vector requiring o(n)
space. Our work differs from Woelfel's in multiple respects; most notably, we do not require storing a large random
vector, and we specifically show that double hashing yields essentially the same results as random hash functions.

Another related work in the balls and bins setting is the paper of Kenthapadi and Panigrahy [14], who consider 
a setting where balls are not allowed to choose any two bins, but are forced to choose two bins corresponding to an 
edge on an underlying random graph. In the same paper, they also show that two random choices that yield d bins are 
sufficient for similar O(log log n) bounds on maximum loads that one obtains with d fully random choices, where in
their case each random choice gives a contiguous block of d/2 bins. 

Interestingly, the average length of an unsuccessful search sequence for standard double hashing in an open address
hash table when the table load is a constant α, a classical question, has been shown to be, up to lower
order terms, 1/(1 - α), showing that double hashing has essentially the same performance as random probing (where
each ball would have its own random permutation of the bins to examine, in order, until finding an empty bin) when
using traditional hash tables [5, 12, 18]. These results appear to have been derived using different techniques than we
utilize here; it could be worthwhile to construct a general analysis that applies for both schemes.

A few papers have recently suggested using double hashing in schemes where one would use multiple hash functions
and shown little or no loss in performance. For Bloom filters, Kirsch and Mitzenmacher [15], starting from the
empirical analysis by Dillinger and Manolios [10], prove that using double hashing has negligible effects on Bloom
filter performance. This result is closest in spirit to our current work; indeed, the type of analysis here can be used to
provide an alternative argument for this phenomenon, although the case of Bloom filters is inherently simpler. Several
available online implementations of Bloom filters now use this approach, suggesting that the double hashing approach
can be significantly beneficial in software as well as hardware implementations.^2 Bachrach and Porat use double
hashing in a variant of min-wise independent sketches [3]. The reduction in randomness stemming from using double
hashing to generate multiple hash values can be useful in other contexts. For example, it is used in [26] to improve
results where pairwise independent hash functions are sufficient for suitably random data. Using double hashing
requires fewer hash values to be generated (two in place of a larger number), which means less randomness in the data
is required for multiple-choice hashing in the results of [26].

Arguably, the main difference between our work and other related work is that in our setting with double hashing 
we find the empirical results are essentially indistinguishable in practice, and we focus on examining this phenomenon. 

2 Empirical Results 

We have done extensive simulation to test whether using double hashing in place of idealized random hashing makes 
a difference for several multiple choice schemes. Theoretically, of course, there is some difference; for example, 
the probability that k balls choose the same specified set of d bins is $O(n^{-dk})$ with fully random choices, and only
$O(n^{-2k})$ with double hashing (where the order notation may hide factors that depend on d). Hence, to be clear, the
best we can hope for are differences up to o(1) events. Empirically, however, our experiments suggest the effects on
the distribution of the loads, or in particular on the probability the maximum load exceeds some value, are all found 
deeply in the lower order terms. Experiments show that unless especially rare events are of special concern, we expect 
the two to perform similarly. 

2.1 The Standard d-Choice Scheme

We first consider n balls and bins using d choices without replacement, comparing fully random choices with double
hashing.^3 When using double hashing we choose an odd stride value as explained previously. All results presented
are over 10000 trials. Table 2 shows the distributions of bin loads for 3 and 4 choices, averaged over all 10000 trials,
for $n = 2^{16}$ and $n = 2^{18}$. (Recall $n = 2^{14}$ was shown in Table 1.) As can be seen, the deviations are all very small,
within standard sampling error.

We may also consider the maximum load. In Table 3 we consider values of n where the maximum load is at most
3, and examine the fraction of time a load of 3 is achieved over the 10000 trials. Again, the difference between the two 
schemes appears small, to the point where it would be a challenge to differentiate between the two approaches. 

A reasonable question is whether the same behavior occurs if the average load is larger than 1. We have tested this
for several cases, and again found that empirically the difference in behavior is negligible. As an example, Table 4
gives results in the case of $2^{18}$ balls being thrown into $2^{14}$ bins, for an average load of 16. Again, the differences are
at the level of statistical noise.

We note that we obtain similar results under variations of the standard d-choice scheme. For example, using
Vöcking's approach of splitting into d subtables and breaking ties to the left, we obtain similar loads. Table 5 shows
results from an exemplary case where d = 4, again averaging over 10000 trials. The case of $n = 2^{18}$ is instructive;
this appears very close to the threshold where bins with load 3 can appear. While there appears to be a deviation,
with double hashing having some small fraction of bins with load 3, this corresponds to exactly 2 bins over the 10000
trials. Further simulations suggest that this apparent gap is less significant than it might appear; over 100000 trials,
the maximum load was 3 for three trials with fully random hashing and for four trials with double hashing.

2 See, for example, http://leveldb.googlecode.com/svn/trunk/util/bloom.cc, https://github.com/armon/bloomd,
and http://hackage.haskell.org/packages/archive/bloomfilter/1./doc/html/bloomfilter.txt.

3 We also considered d choices with replacement, but the difference was not apparent except for very small n, so we present only results
without replacement. However, we note that conversations with George Varghese regarding hardware settings with small n originally motivated our
examination of this approach.

(a) 3 choices, $n = 2^{16}$ balls and bins

| Load | Fully Random | Double Hashing |
|------|--------------|----------------|
| 0    | 0.17695      | 0.17693        |
| 1    | 0.64661      | 0.64664        |
| 2    | 0.17593      | 0.17592        |
| 3    | 0.00051      | 0.00051        |

(b) 4 choices, $n = 2^{16}$ balls and bins

| Load | Fully Random         | Double Hashing       |
|------|----------------------|----------------------|
| 0    | 0.14081              | 0.14083              |
| 1    | 0.71841              | 0.71835              |
| 2    | 0.14076              | 0.14079              |
| 3    | $2.32 \cdot 10^{-5}$ | $2.30 \cdot 10^{-5}$ |

(c) 3 choices, $n = 2^{18}$ balls and bins

| Load | Fully Random | Double Hashing |
|------|--------------|----------------|
| 0    | 0.17696      | 0.17696        |
| 1    | 0.64658      | 0.64648        |
| 2    | 0.17595      | 0.17595        |
| 3    | 0.00051      | 0.00051        |

(d) 4 choices, $n = 2^{18}$ balls and bins

| Load | Fully Random         | Double Hashing       |
|------|----------------------|----------------------|
| 0    | 0.14083              | 0.14082              |
| 1    | 0.71837              | 0.71838              |
| 2    | 0.14078              | 0.14078              |
| 3    | $2.31 \cdot 10^{-5}$ | $2.32 \cdot 10^{-5}$ |

Table 2: Negligible differences in simulation between double hashing and fully random hashing.

(a) 3 choices, fraction with maximum load 3

| n        | Fully Random (%) | Double Hashing (%) |
|----------|------------------|--------------------|
| $2^{10}$ | 39.78            | 39.40              |
| $2^{11}$ | 64.71            | 65.15              |
| $2^{12}$ | 86.90            | 87.05              |
| $2^{13}$ | 98.37            | 98.63              |
| $2^{14}$ | 100.00           | 99.99              |
| $2^{15}$ | 100.00           | 100.00             |

(b) 4 choices, fraction with maximum load 3

| n        | Fully Random (%) | Double Hashing (%) |
|----------|------------------|--------------------|
| $2^{10}$ | 2.24             | 2.23               |
| $2^{12}$ | 8.91             | 8.52               |
| $2^{14}$ | 30.75            | 31.42              |
| $2^{16}$ | 78.23            | 77.72              |
| $2^{18}$ | 99.77            | 99.79              |
| $2^{20}$ | 100.00           | 100.00             |

Table 3: Comparing maximum loads. The fraction of runs with maximum load 3 is similar.

(a) 3 choices, $2^{18}$ balls and $2^{14}$ bins

| Load | Fully Random         | Double Hashing       |
|------|----------------------|----------------------|
| 9    | $6.10 \cdot 10^{-9}$ | $6.10 \cdot 10^{-9}$ |
| 10   | $1.28 \cdot 10^{-7}$ | $1.71 \cdot 10^{-7}$ |
| 11   | $2.50 \cdot 10^{-6}$ | $2.95 \cdot 10^{-6}$ |
| 12   | $4.54 \cdot 10^{-5}$ | $4.51 \cdot 10^{-5}$ |
| 13   | 0.00076              | 0.00076              |
| 14   | 0.01254              | 0.01254              |
| 15   | 0.16885              | 0.16877              |
| 16   | 0.62220              | 0.62234              |
| 17   | 0.19482              | 0.19475              |
| 18   | 0.00079              | 0.00079              |

(b) 4 choices, $2^{18}$ balls and $2^{14}$ bins

| Load | Fully Random         | Double Hashing       |
|------|----------------------|----------------------|
| 11   | $2.44 \cdot 10^{-8}$ | $2.44 \cdot 10^{-8}$ |
| 12   | $1.48 \cdot 10^{-6}$ | $1.34 \cdot 10^{-6}$ |
| 13   | $6.92 \cdot 10^{-5}$ | $6.98 \cdot 10^{-4}$ |
| 14   | 0.00349              | 0.00349              |
| 15   | 0.13908              | 0.13906              |
| 16   | 0.71110              | 0.71114              |
| 17   | 0.14622              | 0.14620              |
| 18   | $2.86 \cdot 10^{-5}$ | $2.85 \cdot 10^{-5}$ |

Table 4: The similarity in performance persists under higher loads.

(a) 4 choices, $2^{14}$ balls and bins

| Load | Fully Random | Double Hashing |
|------|--------------|----------------|
| 0    | 0.12420      | 0.12421        |
| 1    | 0.75160      | 0.75158        |
| 2    | 0.12420      | 0.12421        |

(b) 4 choices, $2^{18}$ balls and bins

| Load | Fully Random | Double Hashing        |
|------|--------------|-----------------------|
| 0    | 0.12421      | 0.12421               |
| 1    | 0.75159      | 0.75158               |
| 2    | 0.12421      | 0.12421               |
| 3    | 0            | $7.63 \cdot 10^{-10}$ |

Table 5: Double hashing performance with Vöcking's d-left scheme.

| λ    | Choices | Fully Random | Double Hashing |
|------|---------|--------------|----------------|
| 0.9  | 3       | 2.02805      | 2.02813        |
| 0.9  | 4       | 1.77788      | 1.77792        |
| 0.99 | 3       | 3.85967      | 3.86073        |
| 0.99 | 4       | 3.24347      | 3.24410        |

Table 6: n = 2 queues, average time

In the standard queueing setting, jobs arrive as a Poisson process of rate λn for λ < 1 to a bank of n first-in first-out
queues, and have exponentially distributed service times with mean 1. Jobs are placed by choosing d queues and
going to the queue with the fewest jobs. The asymptotic equilibrium distributions for such systems with independent,
uniform choices can be found by fluid limit models [21, 32]. We ran 100 simulations of 10000 seconds, recording
the average time over all jobs after time 1000 (allowing the system to "burn in"). An example appears in Table 6.
While double hashing performs slightly worse in these trials, the gap is far less than 0.1% in all cases.

3 Theoretical Results 

We now consider formal arguments for the excellent behavior of double hashing. We begin with some simpler but 
coarser arguments. While our witness tree argument dominates our majorization argument, we present both, as they 
may be useful in considering future variations, and they highlight how these techniques apply in these settings. 

3.1 A Majorization Argument 

Note that using double hashing with two choices and using random hashing with two distinct hash values per ball 
are equivalent. We first present a simple argument, showing the seemingly obvious fact that using double hashing 
with d > 2 choices is at least as good as using 2 random choices. This in turn shows that double hashing maintains 
log log n + O(1) maximum load in the standard balls and bins setting.

Our approach uses a standard majorization and coupling argument, where the coupling links the random choices 
made by the processes when using double hashing and using random hashing while maintaining the fidelity of both 
individual processes. (See, e.g., [2, 4], or [20] for a general treatment.) We say that vector $x = (x_1, \ldots, x_n)$ majorizes
vector $y = (y_1, \ldots, y_n)$ (with the entries of each vector taken in decreasing order) if $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$ and,
for $j < n$, $\sum_{i=1}^{j} x_i \ge \sum_{i=1}^{j} y_i$. For two Markovian processes X
and Y, we say that X stochastically majorizes Y if there is a coupling of the processes X and Y so that at each step 
under the coupling the vector representing the state of X majorizes the vector representing the state of Y. Note that 
using the loads of the bins as the state, the balls and bins processes we consider are Markovian. 
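
The majorization condition can be checked mechanically. The following is a minimal sketch, with the hypothetical helper name `majorizes`, that sorts both load vectors in decreasing order and compares prefix sums; it tests the relation for one pair of vectors and does not implement the coupling itself.

```python
def majorizes(x, y):
    """Return True if load vector x majorizes load vector y.

    Both vectors are sorted in decreasing order; x majorizes y when the
    totals agree and every prefix sum of x is at least that of y.
    """
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    if len(xs) != len(ys) or sum(xs) != sum(ys):
        return False
    prefix_x = prefix_y = 0
    for a, b in zip(xs, ys):
        prefix_x += a
        prefix_y += b
        if prefix_x < prefix_y:
            return False
    return True


# A more skewed allocation majorizes a more balanced one, not vice versa.
print(majorizes([3, 1, 0], [2, 1, 1]))  # True
print(majorizes([2, 1, 1], [3, 1, 0]))  # False
```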

Theorem 1 Let process X be the process where m balls are placed into n bins with two distinct random choices, and 
Y be the corresponding scheme with d > 2 choices using double hashing. Then X stochastically majorizes Y. 

Proof: At each time step, we let x(t) and y(t) be the vectors corresponding to the loads sorted in decreasing order.
We inductively claim that x(t) majorizes y(t) at all time steps under the coupling of the processes where if the ath and
bth bins in the sorted order for X are chosen, the ath and bth bins in the sorted order for Y are chosen as the first two
choices, and then the remaining choices are determined by double hashing. That is, the d hash choices are such that
the gap between successive choices is b - a, so the choices are a, b, 2b - a, 3b - 2a, and so on (modulo the size of
the table). Clearly x(0) majorizes y(0) as the vectors are equal. It is simple to check that this process maintains the
majorization; the additional choices of the double hashing process guarantee that the coordinate that increases in y(t)
at each step is at least as deep in the sorted order as the coordinate that increases in x(t). This property maintains the
desired majorization. ■

As two random choices stochastically majorize d choices from double hashing under this coupling, we see that

$$\Pr(x_1 \ge c) \ge \Pr(y_1 \ge c)$$

for any value c. Since the seminal result of [2] shows that using two choices gives a maximum load of log log n + O(1)
with high probability, we therefore have this corollary.

Corollary 2 The maximum load using d > 2 choices and double hashing is log log n + O(1) with high probability.

A similar argument shows that using d choices stochastically majorizes d + 1 choices when using double hashing.

3.2 A Witness Tree Argument 

We utilize the witness tree approach, following closely the work of Vöcking [31]. (See also [30] for related arguments.)
While we discuss the case of insertions only, the arguments also apply in settings with deletions; see [31] for
more details. Similarly, here we consider only the standard balls and bins setting of n balls and n bins with d ≥ 3
being a constant, but similar results for Vöcking's scheme or for m = cn balls for some constant c can also be derived.
These methods allow us to prove statements of the following form:

Theorem 3 Suppose n balls are placed into n bins using the balanced allocation scheme with double hashing as 
described above. Then with d choices the maximum load is log log n / log d + O(d) with high probability.

We note that, while Vöcking obtains a bound of log log n / log d + O(1), we have an O(d) term that appears
necessary to handle the leaves in our witness tree. (A similar issue appears to arise in [33].) How we deviate from
Vöcking's argument is explained below.

We define a witness tree, which is a tree-ordered (multi)set of balls. Each node in the tree represents a ball, inserted
at a certain time; the ith inserted ball corresponds to time i in the natural way. The ball represented by the root r is
placed at time t, and a child node must have been inserted at a time previous to its parent. A leaf node in Vöcking's
argument is activated if each of the d locations of the corresponding ball contains at least three balls when it is inserted.
An edge (u, v) is activated if, when v is the ith child of u, the ith location of u's ball is the same as one of the
locations of v's ball. A witness tree is activated if all of its leaf nodes and edges are activated.

Following Vöcking's approach, we first bound the probability that a witness tree is activated for the simpler case
where the nodes of the witness trees represent distinct balls. The argument then can be generalized to deal with witness
trees where the same ball may appear multiple times. As this follows straightforwardly using the technical approach
in [31], we do not provide the full argument here.

We now explain where we must deviate from Vöcking's argument. The original argument utilizes the fact that at most
n/3 bins have load at least 3, deterministically. As leaf nodes in Vöcking's argument are required to have all d choices
of bins have load at least 3 to be activated, a leaf node corresponding to a ball with d choices of bins is activated
with probability at most $3^{-d}$. However, this argument will not apply in our case, because the choices of bins are
not independent when using double hashing, and depending on which bins are loaded, we can obtain very different
results. For example, consider a case where the first n/3 bins have load at least 3. The fraction of choices using double
hashing where all d bins have load at least 3 is significantly more than $3^{-d}$, which would be the probability if n/3
bins with load 3 were randomly distributed. Indeed, for a newly placed ball j, if f(j) and g(j) are both less than
n/(3(d + 1)), all d choices will have load at least 3, and this occurs with probability at least $(9(d + 1)^2)^{-1}$. While
such a configuration is unlikely, the deterministic argument used by Vöcking no longer applies.
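
The clustered example can be checked by brute force for a small table. The sketch below is illustrative only: the function name `fraction_all_loaded`, the prime table size n = 101, and d = 3 are assumptions chosen to keep the enumeration cheap. It counts the pairs (f, g) whose d double-hashing choices all fall among the first n/3 bins and compares that fraction with $3^{-d}$.

```python
def fraction_all_loaded(n, d, loaded):
    """Fraction of hash pairs (f, g), with f in [0, n-1] and g in [1, n-1],
    whose d choices (f + k*g) mod n all lie in the set `loaded`.

    n is assumed prime here so that every g in [1, n-1] is a valid stride.
    """
    hits = 0
    for f in range(n):
        for g in range(1, n):
            if all((f + k * g) % n in loaded for k in range(d)):
                hits += 1
    return hits / (n * (n - 1))


n, d = 101, 3
clustered = set(range(n // 3))  # the first n/3 bins are the heavily loaded ones
frac = fraction_all_loaded(n, d, clustered)
print(frac, 3 ** -d)  # the clustered fraction exceeds 3^{-d}
```

For these parameters the contiguous block of loaded bins captures noticeably more than a $3^{-d}$ fraction of the hash pairs, which is the effect the deterministic argument cannot rule out.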

Indeed, as we shall see, we need an even stronger property. Vöcking's argument uses the fact that if the witness
tree has q leaves, they are all activated with probability at most $3^{-dq}$; that is, the activations of distinct leaves are
independent in Vöcking's setting. We similarly need that all leaves are activated with probability at most $3^{-dq}$.

We modify the argument to deal with these issues. To begin, in our double hashing setting, let us call a leaf active 
if either 

• Some ball in the past has two or more of the bins at this leaf among its d choices. 

• All the d bins chosen by this ball have previously been chosen by 4d previous balls. 

The probability that any previous ball has hit two or more of the bins at the leaf is $O(d^4 n^{-1})$: there are $\binom{d}{2}$ pairs of
bins from the d choices at the leaf; at most d(d - 1) pairs of positions within the d choices where that pair could occur
in any previous ball; at most n possible previous balls; and each bad choice that leads that previous ball to have a
specific pair of bins in a specific pair of positions occurs with probability 1/(n(n - 1)). Once we exclude this case,
we can consider only balls that hit at most one of the d bins associated with the leaf.
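
Written out as a single union bound, the four factors just listed multiply as follows (a worked restatement of the count above, not an additional result):

```latex
\binom{d}{2} \cdot d(d-1) \cdot n \cdot \frac{1}{n(n-1)}
  = \frac{d^2 (d-1)^2}{2(n-1)}
  = O\!\left(d^4 n^{-1}\right).
```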

For any time corresponding to a leaf, we bound the probability that any specific bin has been chosen by 4d or
more previous balls. We note by symmetry that the probability any specific ball chooses a specific bin is d/n. The
probability in question is then at most

$$\binom{n}{4d} \left(\frac{d}{n}\right)^{4d} \le \left(\frac{e}{4}\right)^{4d},$$

which is less than 1/3 whenever d ≥ 3. Further, once we consider the case of previous balls that choose two or more
bins at this leaf separately, the events that the d bins chosen by this ball have previously been chosen by 4d previous
balls are negatively correlated. Hence, we find the probability a specific leaf node is activated is less than $3^{-d}$.

However, as mentioned, we need to consider a collection of q leaves and show the probability that they are all active
is less than $3^{-dq}$. We will do this below by using Azuma's inequality to show the fraction of choices of hash values
from double hashing that lead to an activated leaf is less than $3^{-d}$ with high probability. As balls corresponding to
leaves independently choose their hash values, this result suffices.

Let S be the set of pairs of hash values that generate d values that would activate a leaf at time n. We have
$E[|S|] \le (e/4)^{4d^2} n(n-1) + c d^4 (n-1)$ for some constant c, so $E[|S|] \le (3^{-d} - \gamma) n(n-1)$ for some constant $\gamma$ and
large enough n. Consider the Doob martingale obtained by revealing the bins for the balls one at a time. Each ball can
change the final value of |S| by at most dn, since the bin where any ball is placed is involved in fewer than dn choices of
pairs. Azuma's inequality (e.g., [25, Section 12.5]) then yields

$$\Pr\left(|S| \ge 3^{-d} n(n-1)\right) \le \exp(-\delta n)$$

for a constant $\delta$ that depends on d and $\gamma$. It follows readily that the fraction of pairs of hash values that activate a leaf
is at most $3^{-d}$ with very high probability throughout the process; by conditioning on this event, we can continue with
Vöcking's argument. (The conditioning only adds an exponentially small additional probability to the probability the
maximum load exceeds our bound.)

Specifically, we note for there to be a bin of load L + 4d, there must be an activated witness tree of depth L. We
can bound the probability that some witness tree (with distinct balls) of depth L is activated. The probability an edge is
activated is the probability a ball chooses a specific bin, which as previously noted is d/n. As all balls are distinct, the
probability that a witness tree of m balls has all edges activated is $(d/n)^{m-1}$, and as we have shown the probability of
all leaves being activated is bounded above by $3^{-dq}$, where $q = d^L$ is the number of leaves. Following [31], as there
are at most $n^m$ ways of choosing the balls for the witness tree, the probability that there exists an active witness tree
is at most

$$n^m \left(\frac{d}{n}\right)^{m-1} 3^{-dq} \le n \cdot d^{2q} \cdot 3^{-dq} \le n \cdot 2^{-q} = n \cdot 2^{-d^L}.$$

Hence choosing $L \ge \log_d \log_2 n + \log_d (1 + \alpha)$ guarantees a maximum load of at most L + 4d except with probability $O(n^{-\alpha})$.
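
To get a feel for the sizes involved, the bound can be evaluated numerically. The helper below is a hypothetical sketch (the name `witness_tree_bound` and the example parameters are assumptions): it picks the smallest integer L satisfying the condition above and evaluates $n \cdot 2^{-d^L}$, which covers only the distinct-ball witness trees and ignores the exponentially small conditioning term.

```python
import math


def witness_tree_bound(n, d, alpha):
    """Return (L, L + 4*d, bound) for the distinct-ball witness tree argument.

    L is the smallest integer with d**L >= (1 + alpha) * log2(n), and
    bound = n * 2**(-(d**L)) is the resulting failure probability bound.
    """
    L = math.ceil(math.log((1 + alpha) * math.log2(n), d))
    bound = n * 2.0 ** (-(d ** L))
    return L, L + 4 * d, bound


depth, max_load, bound = witness_tree_bound(n=2 ** 14, d=3, alpha=1.0)
print(depth, max_load, bound)
```

The resulting load bound is far from the loads observed in the simulations, consistent with the O(d) additive term being an artifact of the analysis rather than of the process.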

3.3 The Fluid Limit Argument 

We now consider the fluid limit approach of [22]. The fluid limit approach gives equations that describe the asymptotic
fraction of bins with each possible integer load, and concentration around these values follows from martingale bounds
(e.g., [11, 17, 34]). Values can easily be determined numerically, and prove highly accurate even for small numbers
of balls and bins. We show that the same equations apply even in the setting of double hashing, giving a theoretical
justification for our empirical findings in Section 2. This approach can be easily extended to the other multiple choice
processes (Vöcking's scheme, the queueing setting) in this paper.

The standard balls and bins fluid limit argument runs as follows. Let $X_i(t)$ be a random variable denoting the
number of bins with load at least i after tn balls have been thrown; hence $X_0(0) = n$ and $X_i(0) = 0$ for all $i \ge 1$.
Let $x_i(t) = X_i(t)/n$. For $X_i$ to increase when a ball is thrown, all of its choices must have load at least i - 1, but not
all of them can have load at least i. Hence for $i \ge 1$

$$E[X_i(t + 1/n) - X_i(t)] = (x_{i-1}(t))^d - (x_i(t))^d.$$

Let $\Delta(x_i) = x_i(t + 1/n) - x_i(t)$ and $\Delta(t) = 1/n$. Then the above can be written as:

$$E\left[\frac{\Delta(x_i)}{\Delta(t)}\right] = (x_{i-1}(t))^d - (x_i(t))^d.$$

In the limit as n grows, we can view the limiting version of the above equation as

$$\frac{dx_i}{dt} = x_{i-1}^d - x_i^d,$$

where we remove the t on the right hand side as the meaning is clear. Again, previous works [11, 17, 34] justify how
the Markovian load balancing process converges to the solution of the differential equations.^4 These equations allow
us to compute the fraction of bins of each load numerically in the limit; these results closely match our simulations, as
for example shown in Table 7.



Tail load    Fluid Limit    Fully Random    Double Hashing
≥ 1          0.8231         0.8231          0.8231
≥ 2          0.1765         0.1764          0.1764
≥ 3          0.00051        0.00051         0.00051

Table 7: 3 choices, fluid limit (n = ∞) vs. n = 2^14 balls and bins
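As an illustration of how the fluid limit values can be determined numerically, the following minimal sketch integrates the family of equations $dx_i/dt = x_{i-1}^d - x_i^d$ with a classical fourth-order Runge-Kutta step. The function name `fluid_limit_tails`, the truncation at `max_load`, and the step count are assumptions made for this example, not details taken from the paper. For $d = 3$ and $T = 1$ the output should land close to the Fluid Limit column of Table 7.

```python
def fluid_limit_tails(d, T, max_load=8, steps=2000):
    """Integrate dx_i/dt = x_{i-1}^d - x_i^d up to time T with RK4.

    Returns [x_0(T), x_1(T), ..., x_max_load(T)], where x_i is the
    limiting fraction of bins with load at least i.
    Assumption: loads above max_load are ignored (their mass is negligible here).
    """
    def deriv(x):
        # x[0] is pinned at 1 for all time, so its derivative is zero.
        return [0.0] + [x[i - 1] ** d - x[i] ** d for i in range(1, len(x))]

    x = [1.0] + [0.0] * max_load  # x_0(0) = 1, x_i(0) = 0 for i >= 1
    h = T / steps
    for _ in range(steps):
        k1 = deriv(x)
        k2 = deriv([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = deriv([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = deriv([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h * (a + 2 * b + 2 * c + e) / 6.0
             for xi, a, b, c, e in zip(x, k1, k2, k3, k4)]
    return x


tails = fluid_limit_tails(d=3, T=1.0)
for i in range(1, 4):
    print(f"load >= {i}: {tails[i]:.5f}")
```

A useful sanity check is that the tail fractions sum to $T$, since each of the $Tn$ balls raises exactly one bin's load by one.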



Given our empirical results, it is natural to conclude that these differential equations must also describe the behavior of the process when we use double hashing in place of fully random hashing. The question is how to justify this, as the equations were derived using the independence of the choices, which does not hold for double hashing.
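The comparison can be reproduced in a few lines. The sketch below is an illustrative simulation, not the paper's experimental code: it uses Python's seeded `random.Random` as a stand-in for the hash functions, assumes $n$ is a power of two, and draws $g$ odd so that the $d$ double-hashing choices are distinct. The helper names are hypothetical.

```python
import random


def double_hash_choices(n, d, rng):
    # Assumption: n is a power of two and g is odd, so (f + k*g) mod n
    # yields d distinct bins for k = 0, ..., d-1.
    f = rng.randrange(n)
    g = rng.randrange(1, n, 2)
    return [(f + k * g) % n for k in range(d)]


def random_choices(n, d, rng):
    # Fully random hashing: d independent uniform choices.
    return [rng.randrange(n) for _ in range(d)]


def tail_fractions(n, d, choose, seed, max_load=3):
    """Throw n balls into n bins; return fractions of bins with load >= 1..max_load."""
    rng = random.Random(seed)
    load = [0] * n
    for _ in range(n):
        bins = choose(n, d, rng)
        target = min(bins, key=lambda b: load[b])  # ties go to the earliest choice
        load[target] += 1
    return [sum(1 for x in load if x >= i) / n for i in range(1, max_load + 1)]


n, d = 2 ** 14, 3
fully_random = tail_fractions(n, d, random_choices, seed=1)
double_hashed = tail_fractions(n, d, double_hash_choices, seed=2)
for i in range(3):
    print(f"load >= {i + 1}: random {fully_random[i]:.4f}, double {double_hashed[i]:.4f}")
```

A single run at this size is noisy in the last digits; averaging over many seeds would be needed to approach the precision reported in Table 7.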

We now prove that, for a constant number of choices $d$, constant load values $i$, and a constant time $T$ (corresponding to $Tn$ total balls), the loads of the bins chosen by double hashing under the standard scheme are essentially independent, in that

$$E[X_i(t + 1/n) - X_i(t)] = (x_{i-1}(t))^d - (x_i(t))^d + o(1);$$

that is, the gap is only in $o(1)$ terms. The result is that double hashing has no effect on the fluid limit analysis (where the $o(1)$ terms vanish in the limit; see, e.g., [11], [34], and specifically condition (ii) of [34, Theorem 1]), corresponding to the negligible effect we see in practice.

To see this, we consider a different type of witness tree. Here we use ideas found in the work of Bramson, Lu, and Prabhakar [6], who use a similar approach to obtain asymptotic independence results in the queueing setting, but there the concern was limiting independence in equilibrium with general service time distributions. In their model the choices of queues were assumed to be purely random; we show that this methodology applies to the double hashing setting as well.

We define the ancestry list of a bin $b$ at time $t$ as follows. The list begins with the balls $z_1, z_2, \ldots, z_{g(b,t)}$ that have had bin $b$ as one of their choices, where $g(b,t)$ is the number of balls that have chosen bin $b$ up to time $t$. Note that each $z_i$ is associated with a corresponding time $t_i$ and $d - 1$ other bin choices. For each $z_i$, and for each of its other bin choices, we recursively add the list of balls that have chosen that bin up to time $t_i$, and so on. It is clear that the ancestry list gives all the necessary information to determine the load of the bin $b$ at time $t$ (assuming the information regarding choices is presented in such a way as to include how placement will occur in case of ties; e.g., the bin choices are ordered by priority). Note that the ancestry list holds more information (and more balls and bins) than the witness trees used by Vöcking.
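The recursive definition can be made concrete with a short sketch. The representation below is an assumption made for illustration: balls are indexed by arrival order, `choices[j]` holds the bins chosen by ball `j`, and "time $t$" means "after the first `t` balls". The function name `ancestry_list` is hypothetical.

```python
def ancestry_list(choices, b, t):
    """Return the set of ball indices in the ancestry list of bin b after t balls.

    choices[j] is the tuple of bins chosen by ball j; balls arrive in index order.
    """
    # Index the balls by the bins they chose, in arrival order.
    balls_of_bin = {}
    for j, bins in enumerate(choices):
        for x in set(bins):
            balls_of_bin.setdefault(x, []).append(j)

    ancestry = set()
    explored = {}       # bin -> largest time horizon already expanded
    stack = [(b, t)]
    while stack:
        bin_id, horizon = stack.pop()
        if explored.get(bin_id, 0) >= horizon:
            continue    # an earlier expansion already covers this horizon
        explored[bin_id] = horizon
        for j in balls_of_bin.get(bin_id, []):
            if j >= horizon:
                break   # only balls that arrived before the horizon count
            ancestry.add(j)
            for other in choices[j]:
                if other != bin_id:
                    # For ball j's other bins, only balls before ball j matter.
                    stack.append((other, j))
    return ancestry


# Four balls with d = 2 choices each.
example = [(0, 1), (1, 2), (2, 3), (0, 3)]
print(sorted(ancestry_list(example, 0, 4)))
print(sorted(ancestry_list(example, 1, 4)))
```

In this example the list for bin 1 omits ball 2 even though ball 2 shares bin 2 with ball 1, because ball 2 arrived after ball 1. This is the sense in which the construction only looks backward in time.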

For asymptotic independence of the load among a collection of $d$ bins at a specific time when a new ball is placed, it suffices to show that these ancestry lists are small. This implies that the ancestry lists have no bins in common with high probability, since for any two vertices $v_1$ and $v_2$ the probability that the two share a vertex in their ancestry lists will then be dominated by the probability that they are together in some edge in the witness tree, which is $O(d^2/n)$. (If the two vertices are not together in some edge, the events that some other vertex appears in each of the two trees are independent by symmetry.) We show that the expected size of the ancestry list is constant, and that it is logarithmic in size with high probability. It follows that, when a new ball is inserted at some time $t$, the differential equations (up to vanishing $o(1)$ terms) describe the limiting process.

Footnote 4: In particular, the technical conditions corresponding to [34, Theorem 1] hold, and this theorem gives the appropriate convergence.

To clarify this, consider bins $b_1, b_2, \ldots, b_d$ that were chosen by a ball at some time $t + 1/n$. (Recall our scaling of time.) The probability that all $d$ bins have load at least $i$ at that time is equivalent to the probability that each bin $b_j$ has a corresponding ancestry list $A_j$ showing that it has load $i$ at some time $u_j \le t$. Fix a collection of ancestry lists $A_j$, and let $E_j$ be the event defined by "bin $b_j$ has ancestry list $A_j$". If these ancestry lists correspond to distinct balls at distinct times with no bins in common, then

$$\Pr\Bigl(\bigcap_j E_j\Bigr) = \prod_j \Pr(E_j).$$

For constant $i$, $t$, and $d$, the probability that all $d$ bins have load at least $i$ is constant. Hence, if the probability that the ancestry lists for the $d$ bins intersect at any bin is $o(1)$, we have asymptotic independence, giving

$$E[X_i(t + 1/n) - X_i(t)] = (x_{i-1}(t))^d - (x_i(t))^d + o(1)$$

as needed.

To provide some intuition for why ancestry lists are "small", and hence intersect with $o(1)$ probability, we first consider the setting of fully random hashing at a time $T < (1 - \epsilon)/(d(d-1))$ for an arbitrary constant $\epsilon$. We can view the bins as vertices and the balls as hyperedges among the bins they have chosen, and the random choices of the bins by the balls correspond to a random hypergraph on $n$ vertices and $nT < n(1 - \epsilon)/(d(d-1))$ edges of size $d$. Since each edge is adjacent to $d$ vertices, the expected number of hyperedges adjacent to a vertex is at most $(1 - \epsilon)/(d-1)$. In this case, it is well known that the components of the random hypergraph have constant expected size and logarithmic size with high probability (see, e.g., [29]). This can be seen, for example, by coupling with standard branching processes as follows. Start with some vertex $v$ in the hypergraph. For each hyperedge adjacent to $v$, let the $d - 1$ other vertices in that edge be its children. Similarly, for every other hyperedge adjacent to these vertices, let their adjacent vertices be the children of the first generation of children (noting that we do not duplicate vertices so that we maintain a tree), and so on. This can be coupled to the natural branching process where, starting from the root, each vertex generates a Poisson distributed number of adjacent hyperedges, where the Poisson distribution has mean $(1 - \epsilon)/(d-1)$, and each such hyperedge generates $d - 1$ new descendants in the branching process. The expected number of children generated from each node is $1 - \epsilon$, so the branching process dies out with a constant expected number of nodes, and a logarithmic number of total nodes with high probability.
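The constant bound on the expected size can be written out explicitly. The short calculation below is a standard fact about subcritical branching processes, stated here under the idealized assumption that every node has offspring mean exactly $1 - \epsilon$; the symbol $Z_k$ for the size of generation $k$ is introduced only for this illustration.

```latex
% Z_k = number of nodes in generation k, with Z_0 = 1 (the root).
% Each node has (d-1) * Poisson((1-\epsilon)/(d-1)) children, so the
% offspring mean is 1 - \epsilon and E[Z_k] = (1-\epsilon)^k.
\[
  E\Bigl[\sum_{k \ge 0} Z_k\Bigr]
  = \sum_{k \ge 0} (1 - \epsilon)^k
  = \frac{1}{\epsilon}.
\]
```

For example, $\epsilon = 0.1$ gives an expected total of 10 nodes, independent of $n$.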

This exact same coupling of the branching process applies in the case where double hashing is used. That is, even when only using double hashing, when starting from the root each vertex generates a Poisson distributed number of adjacent hyperedges, where the Poisson distribution has mean $(1 - \epsilon)/(d-1)$. The coupling with the branching process pessimistically assumes that all $d - 1$ new descendants generated from a hyperedge are distinct and have not already appeared in the tree, so it does not matter that double hashing is used in place of random hashing; all that matters is how many adjacent hyperedges are generated, and in both cases this asymptotically follows the same Poisson distribution. So again, because the branching process dies out, when using double hashing the ancestry lists remain small.

It follows immediately that the ancestry lists have the necessary properties even under double hashing, as the 
ancestry list for any bin is contained in a single component, and asymptotic independence follows. 

The above argument does not explain why we have asymptotic independence for larger values of $T$, as when $T > 1/(d(d-1))$, the corresponding random hypergraph has a giant component. Here, following [6], we have to make use of the fact that as we add balls to the ancestry list, we are going backward in time. We can here couple the addition of balls to the list with a branching process, which runs for $T$ time units; here time is running backward from the point of view of the bins. Balls join the list of each bin at a rate of $d$ per unit time; in the limit, we can view the joining process as a Poisson process of this rate with no loss in fidelity. When a ball joins, $d - 1$ children bins join the ancestry list. So in the limit for large $n$ we can view the ancestry list of a bin as a continuous $d$-ary branching process
that branches at rate d per unit time over T time units. Standard results from branching theory (e.g., [Q] Q~3]) yield 
bounds on the size of the ancestry tree; the expected size is the constant $e^{d(d-1)T}$, and logarithmic tail bounds follow (from, for example, martingale concentration inequalities). Again, the coupling to the branching process holds equally well when just using double hashing. Hence the ancestry lists are small, and we have asymptotic independence even for constant $T > 1/(d(d-1))$.
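The exponential form of the expected size can be recovered from a one-line differential equation. The derivation below is a sketch under the idealized limiting model described above (each bin in the list independently gains a ball at rate $d$, and each such ball contributes $d - 1$ new bins); the symbol $m(s)$ for the expected number of bins after looking back $s$ time units is introduced only for this illustration.

```latex
% m(s) = expected number of bins in the ancestry list after s time units
% of the backward-running branching process, with m(0) = 1.
\[
  \frac{dm}{ds} = d\,(d-1)\,m(s), \qquad m(0) = 1
  \quad\Longrightarrow\quad
  m(T) = e^{d(d-1)T}.
\]
% Example: d = 3 and T = 1 give m(1) = e^{6} \approx 403,
% a constant that does not grow with n.
```

The key point is that this quantity depends only on $d$ and $T$, so for fixed constants the lists stay small relative to $n$ even past the giant-component threshold.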

As a result, we have the following theorem, generalizing results of [22, Section 4.2.2] to the setting of double hashing.

Theorem 4 Let $i$, $d$, and $T$ be constants. Suppose $Tn$ balls are sequentially thrown into $n$ bins with each ball having $d$ choices obtained from double hashing and each ball being placed in the least loaded bin (ties broken arbitrarily). Let $X_i(T)$ be the number of bins of load at least $i$ after the balls are thrown. Let $x_i(t)$ be determined by the family of differential equations

$$\frac{dx_i}{dt} = x_{i-1}^d - x_i^d,$$

where $x_0(t) = 1$ for all time and $x_i(0) = 0$ for $i \ge 1$. Then

$$\lim_{n \to \infty} E\left[\frac{X_i(T)}{n}\right] = x_i(T).$$



Proof: This follows from the fact that

$$E[X_i(t + 1/n) - X_i(t)] = (x_{i-1}(t))^d - (x_i(t))^d + o(1),$$

and applying [34, Theorem 1].



4 Conclusion 

We have first demonstrated empirically that using double hashing with balanced allocation processes (e.g., the power of (more than) two choices), surprisingly, does not noticeably change performance when compared with fully random hashing. We have then shown that previous methods can readily provide $O(\log \log n)$ bounds for this approach. However, explaining why the fraction of bins of load $k$ for each $k$ appears the same requires revisiting the fluid limit model for such processes. We have shown, interestingly, that the same family of differential equations applies for the limiting process.

This opens the door to the interesting possibility that double hashing can be suitable for other problems or analyses where this type of fluid limit analysis applies, such as low-density parity-check codes [19]. Here, however, the asymptotic independence required was derived from the fact that we were looking at the history of the process, so we could ignore balls that arrived after a ball that joined an ancestry list, resulting in only expected constant-sized histories. Whether similar asymptotic independence can be derived for other problems remains to be seen. Further, for many problems the fluid limit analysis, while an important step, may not offer a complete analysis. So again, determining more generally where double hashing can be used in place of fully random hashing without significantly changing performance may offer challenging future questions.